Overview





= Lifecycle approach to Autonomous Vehicle safety 
e Historically, we assume a perfectly safe production release
e Need to move to a lifecycle adaptation model


— Operational metrics used as basis for 
continuous improvement 


= Safety Performance Indicators (SPIs) 
e Beyond “vehicle is acting unsafely” 
e Beyond dynamic risk management 
e Beyond run-time safety monitors 





e ANSI/UL 4600 SPIs monitor safety case soundness 
Big Changes in Safety Engineering for AVs
= Conventional software safety engineering 
e Do hazard and risk analysis (e.g., ISO 26262) 

e Mitigate hazards; achieve acceptable risk 


e Assume “perfect” for safety when deployed 
— Human driver intervention to clean up loose ends 


= Autonomous system safety is about change 
e Machine learning-based validation is immature 
e Open, imperfectly understood environment 
— Unknown unknowns, gaps in requirements, etc. 
— Keep up with a constantly evolving real world
[Photo: Tartan Rescue's CHIMP in 2015]
e System monitoring → safety/security updates


© 2022 Philip Koopman 3 


https://goo.gl/dBdSDM





Safety Engineering: Hazards & Risks
= Hazard and Risk Analysis for conventional systems
e List all applicable hazards
e Characterize the resultant risk
e Mitigate risk as needed, e.g., update design
e Iterate until all risks are acceptably mitigated
= Use various techniques to create hazard list
e Lessons learned from previous projects; industry standards
e Brainstorming & analysis techniques
— FMEA, Fault Trees, HAZOP, ... bring your own favorite approach ...
= Presumption: all hazards are covered before deployment
e Fully characterized operating environment 
Hazard Analysis for Novel, Open World Systems
= Operating in the open world 

e All hazards aren't known at first 


e Test, test, test until you have 
uncovered enough hazards 


= Safety Of The Intended Functionality (SOTIF)
e Operate in the real world
e Unknowns manifest as “triggering events” (ISO 21448 terminology)
e Mitigate newly discovered hazards caused by triggering events 
e Repeat until you stop seeing triggering events 

= Limitation: residual unknown unknowns (requirements gaps) 
e Hypothesize you can find enough of the unknowns 
Driver Assistance Feedback Model
= Driver does dynamic risk mitigation 
= Useful fiction: systems safe forever when released 
e Driver expected to help mitigate risks & surprises 


e Recalls for defects drivers can’t handle — not supposed to happen 


[Figure labels: DRIVER EXPERIENCE, HAZARD ANALYSIS, TRIGGERING EVENTS]
Reaction To Incidents and Loss Events





= Conventional systems (in practice) too often: 
e Ignore if not reproducible 
e Blame it on the operator 
e Educate operators on workarounds 
e Try again to blame it on the operator 
e VERY reluctantly do a software update
= This persists across domains:
e Power imbalance between victims and designers
e Normalization of #MoralCrumpleZone strategies [https://bit.ly/3qx2D92]
e Poor adoption of software engineering practices 
e The fact that the feedback loop is called a “recall” 
How Is The Recall Approach Working Out?


= Small sampling of NHTSA recalls (confirmed defects) 
e 22V-169 and many others: Backup camera & display failures
e 21V-972: Parking lock system error leads to vans rolling away when parked
e 21V-873 and MANY others: Airbags disabled
e 21V-846: Phantom braking due to inconsistent software state after power up
e 21V-109: Battery controller reset disconnects electric drive motor power
e 20V-748: Improper fail-safe logic degrades brake performance
e 20V-771: Malfunctions of wipers, windows, lights, etc. due to comms failure
e 20V-557 and others: Airbags deploy too forcefully or when they should not
e 17V-713: Engine does not reduce power due to ESP software defect
e 15V-569: Unexpected steering motion causes loss of control
e 15V-145: Unattended vehicle starts engine → carbon monoxide poisoning


See: https://betterembsw.blogspot.com/p/potentially-deadly-automotive-software.html 
Autonomous Vehicles Are Even Worse
= Machine Learning (ML) only learns things it has seen 
e Learns by example 
e Can be brittle; generalization is limited 
e Spectacular failures for the unexpected 


= ML complicates safety engineering 
e Safety engineering assumes “V” model 
e Prone to brittleness to unexpected data variations 
e Were there biases or gaps in training data? 
e Assurance for rare objects and events in the real world? 
— Safety tends to be limited by rare, high-consequence events 
Unusual road obstacles & conditions
Strange behaviors 
Subtle clues 
The Real World: Heavy Tail Distribution


[Figure: arrival rate vs. total testing time. Common things (seen in testing) on the left, edge cases (not seen in testing) on the right. Two curves: random independent arrival rate (exponential) and power law arrival rate (80/20 rule; heavy tail distribution). The heavy tail holds many different, infrequent scenarios. Total area is the same!]




= Where will you be after 1 Billion miles of testing? 
e At 100M miles per fatality, need perhaps 1 billion miles 
= Assume 1 Million miles between unsafe “surprises” 
e Example #1:
— 100 distinct “surprises” @ 100M miles / surprise
e Example #2:
— 100,000 distinct “surprises” @ 100B miles / surprise
— Only 1% of surprises seen during 1B mile testing
— SOTIF fixes of triggering events don't really help
= “Perfect when deployed” no longer a useful fiction 
e We're going to need feedback measurements from deployment 
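The arithmetic behind the two examples can be checked with a short sketch. It assumes each surprise type arrives as an independent Poisson process (an assumption made here for illustration, not stated on the slide), and the function names are hypothetical. Both examples give the same aggregate rate of one surprise per million miles, but they differ sharply in how many surprise types 1 billion miles of testing would be expected to expose, and in how much fixing the exposed ones helps.

```python
import math


def aggregate_miles_between_surprises(n_types: int, miles_per_surprise: float) -> float:
    """Miles between surprises of any type, with n_types equally likely types."""
    return miles_per_surprise / n_types


def expected_fraction_seen(miles_per_surprise: float, test_miles: float) -> float:
    """Chance that a given surprise type appears at least once during testing.

    Assumes independent Poisson arrivals (illustrative assumption).
    """
    return 1.0 - math.exp(-test_miles / miles_per_surprise)


def residual_miles_between_surprises(
    n_types: int, miles_per_surprise: float, test_miles: float
) -> float:
    """Expected miles between surprises after fixing every type seen in testing."""
    unseen_types = n_types * math.exp(-test_miles / miles_per_surprise)
    return miles_per_surprise / unseen_types


TEST_MILES = 1e9  # 1 billion miles of testing, as on the slide

for label, n_types, miles_each in [
    ("Example #1", 100, 100e6),
    ("Example #2", 100_000, 100e9),
]:
    before = aggregate_miles_between_surprises(n_types, miles_each)
    seen = expected_fraction_seen(miles_each, TEST_MILES)
    after = residual_miles_between_surprises(n_types, miles_each, TEST_MILES)
    print(f"{label}: {before:.3g} miles/surprise before testing, "
          f"{seen:.2%} of types seen, {after:.3g} miles/surprise after fixes")
```

Under these assumptions, Example #1 exposes almost every surprise type, while Example #2 exposes roughly 1% of them and leaves the post-fix surprise rate close to where it started.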
Which Metrics Should We Use?
= Key Performance Indicator (KPI) approach is typical: 
e Deviation from intended vehicle path
e Ride smoothness
e Hard braking incidents
e Disengagements during testing
e Coverage of defined scenario catalog
e Risk metrics such as Time to Collision
= But how do we predict operational safety?
e Are KPIs good leading metrics for loss events?
e Does a particular KPI set cover all aspects of safety? 
e How can we select KPIs for traceability to safety? 
= SPI (per ANSI/UL 4600):
e Measurement used to measure or predict safety
= Lagging SPI metrics (how it turned out):
e Arrival rate of adverse events compared to a risk budget
— Example: Loss events (crashes) per hour
e Incidents (could have been a loss event)
— Example: running a red light, wrong lane direction
= Also need leading metrics to predict safety
e We can do that by linking to a safety case
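A lagging SPI of this kind reduces to comparing an observed arrival rate against a risk budget. The sketch below is a minimal illustration; the class name, field names, and budget value are hypothetical and are not taken from ANSI/UL 4600.

```python
from dataclasses import dataclass


@dataclass
class LaggingSpi:
    """Observed adverse-event rate compared to a risk budget (illustrative)."""
    name: str
    budget_events_per_hour: float  # hypothetical risk budget

    def observed_rate(self, events: int, hours: float) -> float:
        if hours <= 0:
            raise ValueError("operating hours must be positive")
        return events / hours

    def within_budget(self, events: int, hours: float) -> bool:
        return self.observed_rate(events, hours) <= self.budget_events_per_hour


# Hypothetical budget: at most one red-light incident per 10,000 operating hours.
red_light = LaggingSpi("red-light incidents", budget_events_per_hour=1e-4)
print(red_light.within_budget(events=3, hours=50_000))  # 6e-5 per hour
print(red_light.within_budget(events=9, hours=50_000))  # 1.8e-4 per hour
```

A raw rate comparison like this ignores sampling noise; with few events, a statistical test on the count is a more defensible acceptance criterion.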
Safety Cases for Autonomous Vehicles





= Claim — a property of the system 
e “System avoids hitting pedestrians” 

= Argument — why this is true 
e “Detect & maneuver to avoid” 

= Evidence — supports argument
e Tests, analysis, simulations, ... 

= Sub-claims/arguments address 
complexity 
e “Detects pedestrians” // evidence 
e “Maneuvers around detected pedestrians” // evidence 
e “Stops if can’t maneuver” // evidence 
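The claim / argument / evidence structure maps naturally onto a tree. The sketch below encodes the pedestrian example as one; the `Claim` class, its fields, and the evidence labels are hypothetical illustrations, not a notation defined by the slides or by any standard.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One node of a safety case: a claim, its argument, and its support."""
    text: str
    argument: str = ""
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)

    def unsupported(self) -> list[str]:
        """Return the leaf claims that have no evidence attached."""
        if self.subclaims:
            missing: list[str] = []
            for sub in self.subclaims:
                missing.extend(sub.unsupported())
            return missing
        return [] if self.evidence else [self.text]


# Evidence labels below are placeholders for illustration only.
safety_case = Claim(
    text="System avoids hitting pedestrians",
    argument="Detect & maneuver to avoid",
    subclaims=[
        Claim("Detects pedestrians", evidence=["perception test results"]),
        Claim("Maneuvers around detected pedestrians", evidence=["simulation runs"]),
        Claim("Stops if can't maneuver"),  # no evidence attached yet
    ],
)

print(safety_case.unsupported())
```

Walking the tree this way shows which sub-claims still lack evidence, which is the kind of gap a safety case review looks for.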








[Figure: safety case structure with ARGUMENT 1 / EVIDENCE 1, and ARGUMENT 2 decomposed into Sub-CLAIM 2A and Sub-CLAIM 2B, each with its own Sub-ARGUMENT and EVIDENCE]
SPIs Instrument a Safety Case





= SPIs monitor the validity of safety case claims 
Example SPIs
= System-Level SPIs:
e Road test incidents caught by safety driver in testing
e Simulator (SIL/HIL) incidents
= Subsystem SPIs:
e Vehicle Controls: compromised vehicle stability
e Path Planning: insufficient clearance to object
e Perception: false negative (non-detection)
e Prediction: unexpected object behavior
= Lifecycle SPIs:
e Maintenance errors 
e Invalid configuration installed 
= An SPI is a metric supported by evidence that uses a
threshold comparison to condition a safety case claim. 
e Metric: measurement of performance, design quality, process 
quality, operational procedure conformance, etc. 
e Threshold: acceptance test on metric value 
— Often statistical (e.g., fewer than X events per billion miles) 
e Evidence: data used to compute the metric 
e Condition a claim: threshold violation falsifies a specific claim 
— Argument for claim is (potentially) proven false by SPI 
e Anything that does not meet all criteria is a KPI, not an SPI 
= SPI violation: part of a safety case has been falsified 
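The four parts of this definition (metric, threshold, evidence, conditioned claim) can be sketched as a small data type. The statistical threshold here is a one-sided Poisson test on event counts, which is an illustrative choice and not something the slides or ANSI/UL 4600 prescribe; the class name, field names, and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


def poisson_tail(k: int, mu: float) -> float:
    """P(N >= k) for N ~ Poisson(mu), computed by direct summation."""
    term = math.exp(-mu)  # P(N = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= mu / (i + 1)  # P(N = i + 1) from P(N = i)
    return max(0.0, 1.0 - cdf)


@dataclass
class Spi:
    claim: str                           # the safety case claim this SPI conditions
    metric: str                          # what is being measured
    max_events_per_billion_miles: float  # threshold ("fewer than X events ...")
    alpha: float = 0.05                  # significance level (assumed value)

    def is_violated(self, events: int, miles: float) -> bool:
        """True if the evidence (events over miles) is unlikely under the claim."""
        expected = self.max_events_per_billion_miles * miles / 1e9
        return poisson_tail(events, expected) < self.alpha


spi = Spi(
    claim="Detects pedestrians",
    metric="perception false negatives",
    max_events_per_billion_miles=10.0,  # hypothetical threshold
)
print(spi.is_violated(events=2, miles=1e8))  # about 1 event expected
print(spi.is_violated(events=5, miles=1e8))
```

A violation here means the data contradict the claim at the chosen significance level, so that part of the safety case needs root cause analysis; it does not by itself say a loss event is imminent.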
SPIs and Lifecycle Feedback





= SPI: direct measurement of claim failure
e Independent of reasoning (“claim is X ... yet here is ~X”)
e Partial measurement(s) OK; multiple SPIs for a claim OK
= A falsified safety case claim:
e Not (necessarily) an imminent loss event
e Safety case has some defect
= Root cause analysis might reveal: 
e Product or process defect 
e Invalid safety argument 
e Issue with supporting evidence 
e Assumption error, ... 








[Figure: safety case fragment showing Sub-ARGUMENT 1A and EVIDENCE 1A]
SPI-Based Feedback Approach





= Safety case argues acceptable risk
e SPIs monitor validity of safety case 


[Figure labels: SOTIF, TRIGGERING EVENTS, ANALYSIS, RUN-TIME SAFETY MONITOR]
SPIs Go Beyond Overt Dangerous Behavior





= “Acts dangerously” is only one dimension of SPIs
e Violation rate of pedestrian buffer zones 
e Time spent closer than safe following distance 
= Components meet safety related requirements 
e False negative/positive detection rates 
e Correlated multi-sensor failure rates 
= Design & Lifecycle considerations 
e Design process quality defect rates 
e Maintenance & inspection defect rates 
= Is it relevant to safety? → Safety Case → SPIs










= Functionality (KPIs):
e Are all the features implemented?
e Does each feature work as intended?
e Is testing progress on track per schedule?
= Runtime safety monitors:
e Trigger risk reduction during run time
= Safety Feedback (SPIs):
e Did the runtime safety monitor miss something?
e Are there dangerous gaps in the Operational Design Domain?
e Are there problems with requirements, design, upkeep, etc.?
e Are there dangerous gaps in fault responses?
Following Distance Example
= Responsibility-Sensitive Safety (RSS) Scenario: 








[Figure: RSS following scenario with a rear vehicle (v_r, a_max,accel, a_min,brake) behind a front vehicle (v_f, a_max,brake)]














e KPI: is average following distance appropriate for driving conditions?
e Runtime monitor: force an increase of following distance if too close
e SPIs: situation more dangerous than expected (e.g., ODD issues)
— Spent more time in too-dense traffic than expected
— Lead/own vehicle braking violates expectations (too often; too aggressive)
— Took too long to recover from lead vehicle cut-in
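As a rough illustration of the difference between the runtime monitor and an SPI, the sketch below computes the longitudinal minimum safe following distance from the published RSS model and then measures the fraction of logged samples spent closer than that distance. The runtime monitor would act on each sample as it happens; the SPI looks at the aggregate. All parameter values, the sample log, and the idea of thresholding this fraction are assumptions made for illustration.

```python
def rss_min_following_distance(
    v_rear: float,       # rear (own) vehicle speed, m/s
    v_front: float,      # front (lead) vehicle speed, m/s
    rho: float,          # response time, s
    a_max_accel: float,  # worst-case rear acceleration during response time, m/s^2
    a_min_brake: float,  # minimum braking the rear vehicle will apply, m/s^2
    a_max_brake: float,  # hardest braking the front vehicle might apply, m/s^2
) -> float:
    """Minimum safe longitudinal gap (m) per the RSS following-distance rule."""
    v_rear_after_response = v_rear + rho * a_max_accel
    gap = (
        v_rear * rho
        + 0.5 * a_max_accel * rho ** 2
        + v_rear_after_response ** 2 / (2 * a_min_brake)
        - v_front ** 2 / (2 * a_max_brake)
    )
    return max(0.0, gap)


def fraction_of_time_too_close(samples, **rss_params) -> float:
    """Fraction of equally spaced (gap_m, v_rear, v_front) samples below the RSS gap."""
    samples = list(samples)
    if not samples:
        return 0.0
    too_close = sum(
        1
        for gap_m, v_rear, v_front in samples
        if gap_m < rss_min_following_distance(v_rear, v_front, **rss_params)
    )
    return too_close / len(samples)


# Hypothetical parameters and log data, for illustration only.
params = dict(rho=1.0, a_max_accel=2.0, a_min_brake=4.0, a_max_brake=8.0)
log = [(60.0, 20.0, 20.0), (50.0, 20.0, 20.0), (56.5, 20.0, 20.0), (10.0, 20.0, 20.0)]
print(rss_min_following_distance(20.0, 20.0, **params))
print(fraction_of_time_too_close(log, **params))
```

If the observed fraction exceeded what the safety case assumed, that would indicate the operating situation is more dangerous than expected, even when the runtime monitor handled every individual instance correctly.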
Sketch of an AV Safety Argument


AV is safe enough to deploy because: 
= We've followed industry safety standards 
e ISO 26262, ISO 21448, ANSI/UL 4600, ... 
e Safety culture is robust 
= Known hazards have been mitigated 
e Residual risk is acceptable at system level 
= Arrival rate of unknowns is low 
e Incidents which do not trigger runtime safing 
= Safety case has good SPI coverage
e SPIs usually detect unknowns without an actual crash
e System is fixed to mitigate unknowns before likely reoccurrence 
Conclusions


= Removing human drivers makes safety much harder 
e Tactical: run-time safety monitoring in vehicle 





e Strategic: SPI monitoring across fleet 
e Field feedback as lifecycle adaptation 
= SPIs predict and monitor system safety 
e KPIs: “how well do we drive?”
e SPIs: “how often are safety claims falsified?”
e SPIs can detect safety problems with no crash
= SPIs: are you as safe as you think you are?
e See ANSI/UL 4600 Chapter 16 for SPI guidance
e Field feedback via SPIs provides lifecycle safety adaptation